From Instance-level Constraints to Space-Level Constraints: Making the Most of Prior Knowledge in Data Clustering
Authors
Abstract
We present an improved method for clustering in the presence of very limited supervisory information, given as pairwise instance constraints. By allowing instance-level constraints to have space-level inductive implications, we are able to successfully incorporate constraints for a wide range of data set types. Our method greatly improves on the previously studied constrained k-means algorithm, generally requiring less than half as many constraints to achieve a given accuracy on a range of real-world data, while also being more robust when over-constrained. We additionally discuss an active learning algorithm that increases the value of constraints even further.
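One way to picture how an instance-level constraint can acquire space-level implications is sketched below. This is an illustrative assumption, not necessarily the procedure used in the paper: must-linked pairs are given distance zero in a pairwise distance matrix, and an all-pairs shortest-path pass then pulls the neighbours of those points closer as well. The function name `propagate_must_links` and the toy data are hypothetical.

```python
def propagate_must_links(dist, must_link):
    """Return a copy of `dist` in which must-linked pairs have distance 0
    and every other distance is shortened by all-pairs shortest paths.

    `dist` is a symmetric list-of-lists distance matrix; `must_link` is a
    list of (i, j) index pairs. Both names are illustrative assumptions.
    """
    n = len(dist)
    d = [row[:] for row in dist]  # leave the caller's matrix untouched

    # Instance level: the constrained pairs themselves become distance 0.
    for a, b in must_link:
        d[a][b] = d[b][a] = 0.0

    # Space level: Floyd-Warshall lets the shortcut shorten distances
    # between points that lie near the constrained pair.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d


# Toy data: four points on a line, forming two well-separated pairs.
points = [0.0, 1.0, 10.0, 11.0]
dist = [[abs(p - q) for q in points] for p in points]

# A single must-link between points 1 and 2 bridges the two pairs.
adjusted = propagate_must_links(dist, must_link=[(1, 2)])
```

In this toy case the single constraint on points 1 and 2 also shortens the distance between points 0 and 3, which were never mentioned in any constraint. That is the sense in which a pairwise constraint can influence the surrounding space rather than only the two instances it names.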
Related resources
Extracting Prior Knowledge from Data Distribution to Migrate from Blind to Semi-Supervised Clustering
Although many studies have been conducted to improve clustering efficiency, most state-of-the-art schemes suffer from a lack of robustness and stability. This paper proposes an efficient approach to elicit prior knowledge, in the form of must-link and cannot-link constraints, from the estimated distribution of raw data in order to convert a blind clustering problem into a semi-supervised o...
Wised Semi-Supervised Cluster Ensemble Selection: A New Framework for Selecting and Combing Multiple Partitions Based on Prior knowledge
The Wisdom of Crowds, an innovative theory described in social science, claims that the aggregate decisions made by a group will often be better than those of its individual members if the four fundamental criteria of this theory are satisfied. This theory has been applied to clustering problems. Previous research has shown that this theory can significantly increase the stability and performance of...
Constraints to Farmers Willingness to Pay for Private Irrigation Delivery in Nandom, Ghana
The study investigated the constraints to farmers’ intention to pay for private irrigation in Nandom District, Ghana. Using key informant interviews and semi-structured questionnaires, the study collected data from 236 farmers. Data were analyzed with descriptive and inferential statistics. Kendall's coefficient of concordance was used to determine the level of agreement among farmers in ranking...
Clustering with Instance-level Constraints
Clustering algorithms conduct a search through the space of possible organizations of a data set. In this paper, we propose two types of instance-level clustering constraints – must-link and cannot-link constraints – and show how they can be incorporated into a clustering algorithm to aid that search. For three of the four data sets tested, our results indicate that the incorporation of surpris...
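The two constraint types named above can be illustrated with a small check that a constrained clustering loop might run before placing a point in a cluster. This is a minimal sketch under assumed conventions, not the algorithm from the paper: constraints are index pairs, `assignment` maps already-placed points to cluster ids, and the names `violates` and `_partner` are hypothetical.

```python
def _partner(pair, point):
    """Return the other endpoint of `pair` if `point` is in it, else None."""
    a, b = pair
    if a == point:
        return b
    if b == point:
        return a
    return None


def violates(point, cluster, assignment, must_link, cannot_link):
    """Return True if putting `point` into `cluster` breaks a constraint.

    Only constraints whose other endpoint is already in `assignment`
    can be violated; unassigned partners are ignored for now.
    """
    for pair in must_link:
        other = _partner(pair, point)
        # Must-link: an assigned partner has to share the same cluster.
        if other is not None and other in assignment:
            if assignment[other] != cluster:
                return True
    for pair in cannot_link:
        other = _partner(pair, point)
        # Cannot-link: an assigned partner must not share the cluster.
        if other is not None and assignment.get(other) == cluster:
            return True
    return False


# Points 0 and 1 are already placed in clusters 0 and 1 respectively.
assignment = {0: 0, 1: 1}
must_link = [(0, 2)]    # point 2 must join point 0
cannot_link = [(1, 3)]  # point 3 must avoid point 1

ok_for_2 = [c for c in (0, 1)
            if not violates(2, c, assignment, must_link, cannot_link)]
ok_for_3 = [c for c in (0, 1)
            if not violates(3, c, assignment, must_link, cannot_link)]
```

Here `ok_for_2` and `ok_for_3` both come out as `[0]`: the must-link forces point 2 into cluster 0, and the cannot-link keeps point 3 out of cluster 1. A clustering loop could then pick the nearest cluster among the permitted ones.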
Instance-Level Constraints in Density-Based Clustering
Clustering data into meaningful groups is one of the most important tasks of both artificial intelligence and data mining. In general, clustering methods are considered unsupervised. However, in recent years, so-called constraints have become more popular as a means of incorporating additional knowledge into clustering algorithms. Over recent years, a number of clustering algorithms employing different t...